Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #71 rounding #72

Merged
merged 10 commits into from
Aug 22, 2023
Merged

Closes #71 rounding #72

merged 10 commits into from
Aug 22, 2023

Conversation

kaz462
Copy link
Collaborator

@kaz462 kaz462 commented Jul 25, 2023

No description provided.

Copy link
Collaborator

@manciniedoardo manciniedoardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great article @kaz462 - I learned a lot about different ways to round 😄 I left some review comments.

posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
posts/2023-07-24_rounding/rounding.qmd Outdated Show resolved Hide resolved
repo_spec = "pharmaverse/blog",
name = long_slug
)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor thing, and perhaps you don't want to do this, but I thought I would mention anyway - would it be worth adding a section looking a the bigger picture, and whether how far we should be trying to go to match SAS rounding? I'm thinking about the fact that if we are trying to move to R-based submissions, then some of this stuff should in a way be expected and accepted. My idea comes from this comment by @rossfarrugia .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when moving from SAS to R submission, what kind of discrepancy are expected/accepted for rounding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to add context to my comment before... when using same language for double programming QC often teams would strive for a 100% perfect comparison match, whereas when working in a multi-lingual world maybe it's ok to not perfectly match if differences can be explained in audit-ready QC evidence to explain differences are down to the use of different languages. e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. a % in first line output with R is 1% and in QC with SAS it is 2%, and upon inspection the QC'er sees that the actual value is 1.5% rounded, hence just add a comment in QC evidence to explain such differences are considered acceptable.

This discrepancy is not caused by SAS/R, but by different rounding approaches, which I think should be based on the study SAP instead of the software. Should the rounding method be clarified first, then choose the corresponding functions in SAS/R?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd doubt most SAPs would go into such detail as nobody really ever questioned rounding approaches when we only used single language. It's a fair point though. I feel like a closing note such as "Note: with the differences in default behaviour across languages, you could consider your QC strategy and whether an acceptable level of fuzz in the electronic comparisons could be allowed for cases such as rounding when making comparisons between 2 codes written in different languages as long as this is documented. Alternatively you could document the exact rounding approach to be used in the SAP and then match this regardless of programming language used."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good summary, I will add it next week after the holiday.
The reason I prefer to align with the rounding method during QC is that when we accept the difference, it's easy to overlook the differences caused by other reasons., e.g., RConsortium/submissions-pilot3-adam#99

Thanks @rossfarrugia @clarkliming for your comments and discussion!

Copy link
Collaborator

@bms63 bms63 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg this rounding numbers graphic is amazing!!

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

CAn we increase the description here. Feels kind of empty
image

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

I think we might want to mention SAS's rounde function, which matches R's round function

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

It might be nice to give this PHUSE working group more of a shout out! A cross-industry initiative to document differences between programming languages.

image

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

SO I first encountered this issue when double programming in R for TLGs made in SAS. I think you experienced the same issue and I think that should be mentioned here to set the stage.

image

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

Maybe change this to Exploring Rounding Options. Bug Fix makes it seem like R is doing something wrong and SAS is doing it right.
image

@bms63
Copy link
Collaborator

bms63 commented Jul 30, 2023

The post just sort of ends abruptly - anyway we could tie it up with your current experiences/summary of the post?

This one is going to be awesome!!!

@kaz462
Copy link
Collaborator Author

kaz462 commented Aug 1, 2023

Maybe change this to Exploring Rounding Options. Bug Fix makes it seem like R is doing something wrong and SAS is doing it right. image

what do you think of changing "Bug Fix" to "Numerical precision issue"?
I also linked a SAS documentation on this topic: Numerical Accuracy in SAS Software

@kaz462
Copy link
Collaborator Author

kaz462 commented Aug 1, 2023

@manciniedoardo @bms63 Thanks so much for your review and comments!

I have limited knowledge on computational precision and floating-point, so I didn't go too deep on this topic, let me know if everything makes sense by referencing multiple resources.

@bms63 bms63 self-requested a review August 2, 2023 17:57
Copy link
Collaborator

@bms63 bms63 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think option 2 for the table would be helpful, but I'm okay if it doesn't happen.

The discussion on a clean compares between two languages is interesting. Maybe a follow up post?

Perhaps leave them with a question at the end of the post. How concerned should we be if a programs in R and and SAS are off ...

@kaz462
Copy link
Collaborator Author

kaz462 commented Aug 2, 2023

Perhaps leave them with a question at the end of the post. How concerned should we be if a programs in R and and SAS are off ...

Good idea on the open-ended question. The following example illustrate the "safe" options I mentioned in the last section doesn't work as expected, but I'm not sure if this will ever happen in our daily work.

In this example, a is a value slightly less than 1.5. So if we choose round half up approach with 0 decimal places, output 1 is expected, but because + sqrt(.Machine$double.eps) is used, we get 2 as the result.

> a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
> janitor::round_half_up(a, digits = 0)
[1] 2

If it happens, maybe we could argue a and 1.5 are nearly equal by using all.equal() function from base R with default tolerance value (default tolerance = sqrt(.Machine$double.eps)

> all.equal(a, 1.5)
[1] TRUE

@manciniedoardo @rossfarrugia @bms63 depends on the situation, maybe discrepancies like this example can be accepted?
should I add this example in the last section?

After the fix:

```{r, message = FALSE}
# revised rounds half up

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally, I don't think we should do this. this will change a lot of behaviors, like ut_round1(2436.84499999999999, 2) gives 2436.85 while round(2436.84499999999999, 2) gives 2436.84 (which is correct)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clarkliming Thanks for the review! your example is a similar case to the one mentioned above: a <- 1.5 - 0.5*sqrt(.Machine$double.eps)

The reason for this behavior is that 2436.845 - 2436.84499999999999 is less than sqrt(.Machine$double.eps), the adjustment in the function is not big enough to round it up. When we encounter this situation, maybe we could handle it by the following options -

  1. accept this value and explain: 2436.845 and 2436.84499999999999 are nearly equal by using all.equal() function from base R with default tolerance value (default tolerance = sqrt(.Machine$double.eps)

    • the output from SAS round(2436.84499999999999, 0.01) is also 2436.85
  2. choose a smaller value than sqrt(.Machine$double.eps) or remove it to adjust this function

How would you do "round half up" in R if we don't choose those functions (ut_round1, janitor::round_half_up, ...)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this function will be used deeply inside other functions and large-scale used, so controlling the tolerance manually is not likely(which means that users can only apply one tolerance to all the numeric results obtained).

I will split your question into two: one is how do we deal with round half up, the other is how do we deal with rounding error.

My response to these two questions is the same: no action is needed.

For question 1, R is following IEEE 754, with default bankers rounding. What SAS provided seems reasonable, however, R is not wrong. This is just a different method. R does not need to copy SAS.

For question 2, about the accuracy loss in computation, I will also do nothing. This is an issue by design, and not possible to perfectly solve at the moment.

Both issues won't affect the interpretation of the result

So for me, introducing the fact that R and SAS are different in rounding should be sufficient.

Come back to the issue. The function ut_round1 tries to solve the rounding error arising from the design, but it did not. So it is important to also mention this point. (also, this issue is not possible to solve in the current design, this fix only fixes one part of the issue)

Copy link
Collaborator Author

@kaz462 kaz462 Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not trying to claim that base R's round function is wrong or we should strictly mimic SAS's behavior, but to provide round half up options in R. Similar as the intention of providing rounding option in Tplyr, offering users more flexibility - there may be contexts where 'round half up' is required.

What do you think of using the following examples to highlight those round_half_up functions do not offer the same level of precision and accuracy as the base R round function?
Please feel free to update this post directly, or let me know how to clarify this point :)

> a <- 1.5 - 0.5*sqrt(.Machine$double.eps)
> b <- 1.5 - 0.5*.Machine$double.eps
> janitor::round_half_up(a)
[1] 2
> janitor::round_half_up(b)
[1] 2
> round(a)
[1] 1
# base round reaches the precision limit:
> round(b)
[1] 2

@kaz462 kaz462 requested a review from bms63 August 22, 2023 01:32
Copy link
Collaborator

@bms63 bms63 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all good for me! @rossfarrugia @StefanThoma any last concerns or can we merge in and publish?

@rossfarrugia
Copy link
Contributor

Fine from my side 👍

@bms63 bms63 merged commit c0a3c22 into main Aug 22, 2023
4 checks passed
@bms63 bms63 deleted the 71_rounding branch August 22, 2023 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants